Semi-Automatic Annotation of Natural Language Vulnerability Reports
نویسندگان
چکیده
Those who do not learn from past vulnerabilities are bound to repeat it. Consequently, there have been several research efforts to enumerate and categorize software weaknesses that lead to vulnerabilities. The Common Weakness Enumeration (CWE) is a community developed dictionary of software weakness types and their relationships, designed to consolidate these efforts. Yet, aggregating and classifying natural language vulnerability reports with respect to weakness standards is currently a painstaking manual effort. In this paper, the authors present a semi-automated process for annotating vulnerability information with semantic concepts that are traceable to CWE identifiers. The authors present an information-processing pipeline to parse natural language vulnerability reports. The resulting terms are used for learning the syntactic cues in these reports that are indicators for corresponding standard weakness definitions. Finally, the results of multiple machine learning algorithms are compared individually as well as collectively to semi-automatically annotate new vulnerability reports. Semi-Automatic Annotation of Natural Language Vulnerability Reports
منابع مشابه
On Designing Controlled Natural Languages for Semantic Annotation
Manual semantic annotation is a complex and arduous task both time-consuming and costly often requiring specialist annotators. (Semi)-automatic annotation tools attempt to ease this process by detecting instances of classes within text and relationships between classes, however their usage often requires knowledge of Natural Language Processing(NLP) and/or formal ontological descriptions. This ...
متن کاملWide-Coverage Grammar Extraction from Thai Treebank
Parsing is an important step for natural language understanding, including phrase alignment for supporting statistical machine translation. Ability on analysing real text by parser strongly depends on grammar. Treebank could be one of the sources for grammar extraction. However, treebank construction largely relies on human annotators intuitions. Different intuitions from multiple annotators br...
متن کاملOBA: Supporting Ontology-Based Annotation of Natural Language Resources
In this paper, we introduce OBA – an application for NLP-based annotation of natural language texts with ontology classes and relations. OBA provides support for different tasks required for semi-automatic semantic annotation. Among other things, it supports creating manual semantic annotations in order to enrich the set of lexical patterns, automatically annotating large corpora based on speci...
متن کاملSemi-automatic compound nouns annotation for data integration systems
Lexical annotation is the explicit inclusion of the “meaning” of a data source element according to a lexical resource. Accuracy of semi-automatic lexical annotator tools is poor on real-world schemata due to the abundance of non-dictionary compound nouns. It follows that a large set of relationships among different schemata is discovered, including a great amount of false positive relationship...
متن کاملFast semi-automatic semantic annotation for spoken dialog systems
This paper describes a bootstrapping methodology for semi– automatic semantic annotation of a “mini–corpus” that is conventionally annotated manually to train an initial parser used in natural language understanding (NLU) systems. We propose to cast the problem of semantic annotation as a classification problem: each word is assigned a unique set of semantic tag(s) and/or label(s) from the univ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IJSSE
دوره 4 شماره
صفحات -
تاریخ انتشار 2013